❗ Before You Start Learning

Mindset is everything, because it guides our decisions and the actions we take.

Believe that you can do it: you can learn data science too!

📖 What is Programming?

Before we dive deeper, let's watch this video: Code Stars.

Programming means writing instructions for a computer to perform desired actions or tasks.

To write instructions we need a language, just as humans need one to talk to each other. Because the computer can't understand our natural languages, such as English or Bahasa Indonesia, we need to communicate with it in a language it understands: a programming language.

💻 Programming Language

A programming language is a set of commands, instructions, and other syntax used to create a software program. Computers only understand 0s and 1s, while humans do not naturally read or write them. A programming language therefore acts as an intermediary between the computer and the programmer.

There are hundreds of programming languages, each with its own uses. Most popular programming languages are high-level languages, which are easier for humans to read and write. Two of them are Python and R, which we will learn more about in the Algoritma Data Science School course.

🔍 Basic Programming

Variable

A variable is a named place to store data, while a data type describes the kind of data stored in a variable (for example, a number or text).

If data is food, then a variable is the container where we store the food.

R

Use the assignment operator <- to create new variables.

x <- 5
x
## [1] 5

Python

Use the assignment operator = to create new variables.

print("Hello Python!")
## Hello Python!
x = 5
x
## 5
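
To make the difference between a variable and a data type more concrete, here is a small Python sketch. The variable names (`count`, `price`, `channel`, `is_active`) are made up for illustration and do not come from the course material. The built-in `type()` function reports the data type of the value stored in each variable.

```python
# Each variable stores a value; type() reports the data type of that value.
count = 5            # int: a whole number
price = 4.5          # float: a decimal number
channel = "Youtube"  # str: text
is_active = True     # bool: True or False

for name, value in [("count", count), ("price", price),
                    ("channel", channel), ("is_active", is_active)]:
    # type(value).__name__ gives the short name of the data type, e.g. "int"
    print(name, "=", value, "->", type(value).__name__)
```

In Python we do not declare the data type ourselves; it is determined by the value we assign.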

If-Else Statement

The if-else statement is one of the simplest decision-making statements in programming: it runs one block of code when a condition is true and another block when the condition is false.

R

x <- 5
if (x > 0) {
  print('x is positive')
} else {
  print('x is negative')
}
## [1] "x is positive"
print('This line executes after the if-else statement')
## [1] "This line executes after the if-else statement"

Python

x = -5
if x > 0:
  print('x is positive')
else:
  print('x is negative')
## x is negative
print('This line executes after the if-else statement')
## This line executes after the if-else statement
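
The examples above only distinguish "greater than zero" from everything else, so a value of exactly 0 would be reported as negative. As a minimal sketch of how a third case could be handled, Python's `elif` adds another condition between `if` and `else`. The function name `describe_sign` is hypothetical and used only for this illustration.

```python
def describe_sign(x):
    """Return a short description of the sign of x."""
    if x > 0:
        return "x is positive"
    elif x < 0:
        # Checked only when the first condition is False
        return "x is negative"
    else:
        # Reached only when x is neither greater nor less than 0
        return "x is zero"

for value in [5, -5, 0]:
    print(value, "->", describe_sign(value))
```

The conditions are checked from top to bottom, and only the first matching branch runs.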

Vector (R) / List (Python)

A vector in R stores elements that all have the same class. To store elements of different classes and lengths, we can use a list, which is available in both R and Python.

R

In R, a list can store elements of different classes and lengths. When all the elements have the same class, we can instead use the combine function c() to create a vector.

Vector

my_vector <- c("a", "b", "c") 
print(my_vector)
## [1] "a" "b" "c"

List

my_list <- list("apple", 1, "cherry", my_vector)
my_list[1]
## [[1]]
## [1] "apple"
my_list[length(my_list)] # the last element, returned as a list of length one
## [[1]]
## [1] "a" "b" "c"
my_list[-1] # every element except the first
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "cherry"
## 
## [[3]]
## [1] "a" "b" "c"

Python

A list in Python can store elements of different types and lengths. Python uses zero-based indexing: the first element ("apple" in the example below) has index 0, the second element (1) has index 1, and so on. Negative indexing counts from the end of the sequence, so index -1 is the last element.

my_list = ["apple", 1, "cherry", ["a", "b", "c"]]
my_list[0]
## 'apple'
my_list[-1] # the last element
## ['a', 'b', 'c']
my_list[1:] # every element except the first
## [1, 'cherry', ['a', 'b', 'c']]
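
Note that the same expression, `my_list[-1]`, means different things in the two languages: R drops the first element, while Python returns the last one. The following sketch collects the Python indexing rules in one place, reusing the list from above. The variable names `first`, `last`, `middle`, and `nested` are only for illustration.

```python
my_list = ["apple", 1, "cherry", ["a", "b", "c"]]

first = my_list[0]      # index 0 is the first element
last = my_list[-1]      # index -1 is the last element
middle = my_list[1:3]   # slice: indexes 1 and 2 (the stop index 3 is excluded)
nested = my_list[3][0]  # first element of the inner list

print(first)   # apple
print(last)    # ['a', 'b', 'c']
print(middle)  # [1, 'cherry']
print(nested)  # a
```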

Loop (For)

A for loop iterates over a sequence (such as a vector or a list) or another iterable object, running the same block of code once for each element.

R

for (i in c("apple", "banana", "cherry"))
{
    print(i)
}
## [1] "apple"
## [1] "banana"
## [1] "cherry"

Using list()

for (i in list("apple", 1, "cherry"))
{
    print(i)
}
## [1] "apple"
## [1] 1
## [1] "cherry"

Python

my_list = ["apple", 1, "cherry"]
for i in my_list:
  print(i)
## apple
## 1
## cherry
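
A loop is often used to do more than print each element. As a small sketch, the code below uses Python's built-in `enumerate()` to get the position of each element along with its value, and then builds a new list inside a second loop. The names `fruits` and `lengths` are assumptions made for this example.

```python
fruits = ["apple", "banana", "cherry"]

# enumerate() yields (position, element) pairs, starting from position 0
for position, fruit in enumerate(fruits):
    print(position, fruit)

# Collect a result for each element: here, the number of characters
lengths = []
for fruit in fruits:
    lengths.append(len(fruit))

print(lengths)  # [5, 6, 6]
```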

DataFrame

A DataFrame is a data structure that organizes data into a two-dimensional table of rows and columns, much like a spreadsheet.

R

To create a DataFrame in R, we can use data.frame().

iklan <- data.frame(Channel=c("Youtube", "Instagram", "Facebook", "Twitter"), budget=c(8.0, 4.5, 4.3, 2.5))
iklan

Get the Structure of DataFrame

We can inspect the structure of a data frame with the str() function in R.

str(iklan)
## 'data.frame':    4 obs. of  2 variables:
##  $ Channel: chr  "Youtube" "Instagram" "Facebook" "Twitter"
##  $ budget : num  8 4.5 4.3 2.5

Summary of DataFrame

For an R data frame, the summary() function returns a statistical summary of each column.

summary(iklan)
##    Channel              budget     
##  Length:4           Min.   :2.500  
##  Class :character   1st Qu.:3.850  
##  Mode  :character   Median :4.400  
##                     Mean   :4.825  
##                     3rd Qu.:5.375  
##                     Max.   :8.000

Extract Data from DataFrame

Extracting data from a data frame means accessing its rows or columns. We can extract a specific column by its name using the $ operator.

iklan$Channel
## [1] "Youtube"   "Instagram" "Facebook"  "Twitter"

Expand DataFrame

A data frame in R can be expanded by adding new columns and rows to the already existing data frame.

iklan$Color <- c("Red", "Purple", "Blue", "Soft Blue")
iklan

Python

To create a DataFrame in Python, we need the pandas library, so we import it first.

import pandas as pd
iklan = pd.DataFrame({
    "Channel": ["Youtube", "Instagram", "Facebook", "Twitter"],
    "budget": [8.0, 4.5, 4.3, 2.5],
})
print(iklan)
##      Channel  budget
## 0    Youtube     8.0
## 1  Instagram     4.5
## 2   Facebook     4.3
## 3    Twitter     2.5

Get the Structure of DataFrame

We can inspect the structure of a pandas.DataFrame with its .info() method.

iklan.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 4 entries, 0 to 3
## Data columns (total 2 columns):
##  #   Column   Non-Null Count  Dtype  
## ---  ------   --------------  -----  
##  0   Channel  4 non-null      object 
##  1   budget   4 non-null      float64
## dtypes: float64(1), object(1)
## memory usage: 192.0+ bytes

Summary of DataFrame

For a pandas.DataFrame, the .describe() method returns a statistical summary. By default it covers only the numeric columns, which is why only budget appears in the output.

iklan.describe()
##          budget
## count  4.000000
## mean   4.825000
## std    2.299819
## min    2.500000
## 25%    3.850000
## 50%    4.400000
## 75%    5.375000
## max    8.000000
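
To see where these numbers come from, the sketch below recomputes them for the budget values using only Python's standard `statistics` module. This is an illustration, not how pandas works internally. It assumes Python 3.8 or newer, because it relies on `statistics.quantiles` with `method="inclusive"`, which interpolates between data points.

```python
import statistics

budget = [8.0, 4.5, 4.3, 2.5]

summary = {
    "count": len(budget),
    "mean": statistics.mean(budget),
    "std": statistics.stdev(budget),  # sample standard deviation (divides by n - 1)
    "min": min(budget),
    "max": max(budget),
}

# n=4 splits the data into quarters and returns the three cut points
q1, median, q3 = statistics.quantiles(budget, n=4, method="inclusive")
summary.update({"25%": q1, "50%": median, "75%": q3})

for name, value in summary.items():
    print(f"{name:>5}  {value:.6f}")
```

The printed values should agree with the .describe() output above, although the rows appear in a different order.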

Extract Data from DataFrame

We can also extract a specific column from a pandas.DataFrame using its column name, either with square brackets or with dot notation.

iklan["Channel"]
## 0      Youtube
## 1    Instagram
## 2     Facebook
## 3      Twitter
## Name: Channel, dtype: object
iklan.Channel
## 0      Youtube
## 1    Instagram
## 2     Facebook
## 3      Twitter
## Name: Channel, dtype: object

Expand DataFrame

A Python pandas.DataFrame can be expanded by adding new columns and rows to the already existing data frame.

iklan["Color"] = ["Red", "Purple", "Blue", "Soft Blue"]
iklan
##      Channel  budget      Color
## 0    Youtube     8.0        Red
## 1  Instagram     4.5     Purple
## 2   Facebook     4.3       Blue
## 3    Twitter     2.5  Soft Blue
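
The example above adds a column. One possible way to add a row is `pd.concat`, sketched below. To keep the sketch self-contained, it rebuilds the original two-column table under a new name, `ads`. The extra "TikTok" row and its budget are invented purely for illustration and are not part of the course data.

```python
import pandas as pd

# A fresh copy of the original two-column table
ads = pd.DataFrame({
    "Channel": ["Youtube", "Instagram", "Facebook", "Twitter"],
    "budget": [8.0, 4.5, 4.3, 2.5],
})

# Hypothetical new row, invented for this example only
new_row = pd.DataFrame({"Channel": ["TikTok"], "budget": [3.0]})

# pd.concat stacks the two frames; ignore_index=True renumbers the rows 0 to 4
ads_expanded = pd.concat([ads, new_row], ignore_index=True)
print(ads_expanded)
```

`pd.concat` returns a new DataFrame, so `ads` itself still has four rows afterwards.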